Central Nodes in Protein Interaction Networks Drive Critical Functions in Transforming Growth Factor Beta-1 Stimulated Kidney Cells.

OBJECTIVE
Despite the huge efforts, chronic kidney disease (CKD) remains as an unsolved problem in medicine. Many studies have shown a central role for transforming growth factor beta-1 (TGFβ-1) and its downstream signaling cascades in the pathogenesis of CKD. In this study, we have reanalyzed a microarray dataset to recognize critical signaling pathways controlled by TGFβ-1.


MATERIALS AND METHODS
This study is a bioinformatics reanalysis for a microarray data. The GSE23338 dataset was downloaded from the gene expression omnibus (GEO) database which assesses the mRNA expression profile of TGFβ-1 treated human kidney cells after 24 and 48 hours incubation. The protein interaction networks for differentially expressed (DE) genes in both time points were constructed and enriched. In addition, by network topology analysis, genes with high centrality were identified and then pathway enrichment analysis was performed with either the total network genes or with the central nodes.


RESULTS
We found 110 and 170 genes differentially expressed in the time points 24 and 48 hours, respectively. As the genes in each time point had few interactions, the networks were enriched by adding previously known genes interacting with the differentially expressed ones. In terms of degree, betweenness, and closeness centrality parameters 62 and 60 nodes were considered to be central in the enriched networks of 24 hours and 48 hours treatment, respectively. Pathway enrichment analysis with the central nodes was more informative than those with all network nodes or even initial DE genes, revealing key signaling pathways.


CONCLUSION
We here introduced a method for the analysis of microarray data that integrates the expression pattern of genes with their topological properties in protein interaction networks. This holistic novel approach allows extracting knowledge from raw bulk omics data.


Introduction
Chronic kidney disease (CKD) is a public health problem and a leading cause of death. Despite using current therapies to slow progression of CKD, respective patients are still reaching the end stage renal disease (ESRD) at alarming proportions (1). The histological feature of this debilitating disor-der is excessive deposition of extra-cellular matrix (ECM) defined as renal fibrosis. Recent studies declared that transforming growth factor beta-1 (TGFβ-1) is the major driver of fibrosis in kidney, stimulating a variety of signaling pathways related to deposition of ECM components (2). In spite of enormous researches on the role of TGFβ-1 and downstream elements in the progression of CKD (3,4), few studies have employed holistic and computational methods for investigation of kidney disorders. Among these studies, there is an elegant report presented by Jin et al. (5) who employed gene regulatory network concepts to analyze high-throughput gene expression data. They could predict and experimentally validate HIPK2 as a potential drug target in HIV-associated nephropathy.
In this study, we propose a holistic approach to investigate the molecular interactions and signaling pathways in response to TGFβ-1 stimulation in human kidney cells. A microarray dataset has been generated by Walsh et al. (6) that examines the expression profile of human tubular epithelial cells before and after treatment with TGFβ-1 for 24 and 48 hours. However, they only focused on the few top differentially expressed (DE) genes including GREM1, JAG1 and HES1. They identified Notch signaling as a critical pathway in diabetic nephropathy. In the current study, we introduced a new method for the analysis of the same microarray dataset that integrated the expression pattern of genes with their topological location in the gene interaction network. Using this strategy, we could infer more informative signaling pathways related to TGFβ-1 stimulation. This approach could also be employed for other large data to improve our understanding of biological processes by extracting remarkable concepts from bulk omics data.

Microarray data
This study is a bioinformatics analysis of GSE23338 dataset, originally generated by Walsh et al. (6). mRNA expression profile was downloaded from the Gene Expression Omnibus (GEO) database (7). In this microarray experiment, transcriptional response of human proximal tubule epithelial cells (HK-2) to TGFβ-1 stimulation after 24 and 48 hours was assessed. Using GEO2R tool of GEO, the TGFβ-1 treated cells (24 or 48 hours) were compared to untreated HK-2 cells. Benjamini-Hochberg false discovery rate method was applied for P value adjustment. Genes with adjusted P≤0.05 were considered as differentially expressed.

Protein-protein interaction network
Using CluePedia plugin (8) of the Cytoscape software version 3.1.0 (9), a protein-protein interaction (PPI) network was constructed for the DE genes in time point of 24 hours or 48 hours. Topology of networks was analyzed by the NetworkAnalyzer tool of Cytoscape software.

Pathway enrichment analysis
Pathway enrichment analysis for DE genes was carried out using ClueGO plugin (10) of Cytoscape. In this analysis, KEGG and Reactome databases were chosen for retrieving data and network specificity was adjusted to medium. Bonferroni step down was applied for P value adjustment and pathways with adjusted P≤0.05 were chosen.

Results
In this study, we reanalyzed the GSE23338 microarray dataset assessing mRNA expression profile of HK-2 cells after 24 and 48 hours of treatment with TGFβ-1. Analysis by GEO2R revealed that 110 genes after 24 hours and 170 genes after 48 hours were differentially expressed with adjusted P≤0.05 (Table 1). To investigate the interaction between variably expressed genes, a network was constructed for each time point. Although different kind of interactions (activation, post-translational modification, expression and binding) were allowed to be shown, unexpectedly, few interactions were appeared in both networks (Fig.1A, B). To infer pathways related to the DE genes and understand the down-stream processes controlled by TGFβ-1, pathway enrichment analysis was performed, showing only 12 pathways for 24 hours (Fig.1C) and 10 pathways for 48 hours treatments (Fig.1D), with few connections between the signaling pathways.
The scarcity of interactions in PPI and pathway networks was not unexpected, as they were derived from mRNA microarray data which can only detect genes with altered mRNA level, thus regulated genes at other levels were Central Nodes in PPI Drive Critical Functions missed. Hence, to predict other role players, we enriched both PPI networks by adding one interacting node for each gene. This resulted in expansion from 110 to 199 nodes for 24 hours ( Fig.2A) and from 170 to 301 nodes for 48 hours treatment (Fig.2B). PPI networks were reconstructed with the same parameters applied initially. To determine the most central genes in these enriched networks, their topology was assessed by graph theory measures such as degree, betweenness centrality, and closeness centrality. In each network, the genes were sorted based on each of these features. Then, the top 20% genes in 24 hours treatment and 15% genes with higher rank in 48 hours were chosen. Because of overlapping nodes between the above three centrality parameters, a total of 62 genes in time point of 24 hours ( Table 2) and 60 genes in time point of 48 hours (Table 3) were finally selected. Again, pathway enrichment analysis was performed with either the central genes or the total genes in these two enriched networks. The central genes in time points 24 and 48 hours networks were related to 29 and 49 pathways, respectively (Fig.3). These pathways were strongly related to CKD and formed a deeply connected network in both time points. Interestingly, pathway enrichment analysis with the total enriched networks genes, only determined 16 and 18 pathways for time points of 24 and 48 hours, respectively. These pathways were less inter-connected compared to those derived from the central genes (Fig.4).
Pathway enrichment analysis with the central genes predicted Notch, TNF, P53, Activin and TGFβ signaling as well as platelet-related pathways, affected after TGFβ-1 treatment in both 24 and 48 hours. However, Hippo, PDGF and FGFR signaling pathways were enriched only in the second time point.

Discussion
In this study, we reanalyzed a microarray dataset to determine gene expression alteration in response to TGFβ-1 in a human kidney cell line. The investigators who originally generated this data emphasized the involvement of Notch signaling pathway based on a few DE genes (6). In contrast, we have constructed PPI networks for DE genes in the time points of 24 and 48 hours treatment. We found that expansion of these networks followed by selection of central nodes for pathway enrichment analysis is an efficient method to recognize key signaling pathways in response to TGFβ-1 stimulation. Our analysis also predicted the potential role of some novel pathways in this in vitro model and also pointed out time-dependent activation of particular pathways. Interestingly, the same investigators later repeated the experiment and assessed the mRNA expression profile by RNA-Seq and found that this technique is superior to microarray in identification of the DE genes and altered signaling pathways (11). Noteworthy, the signaling pathways determined by our analysis on the original microarray dataset is similar to the pathways identified with RNA-Seq data.
An interesting finding in this study was that pathway enrichment analysis with the DE genes in the microarray experiment was not efficient for prediction of key signaling pathways. However, it was expected that all important genes were not regulated at the mRNA level and so they were not detectable by mRNA microarrays. Therefore, to compensate for this limitation, we constructed a PPI network of DE genes and then enriched this network by adding genes that were previously known to be interacting with the initial network nodes. This expanded gene set was more informative for detecting signaling pathways. Indeed, it is perfect to perform multi-level assessments in biological experiments, but for practical reasons it is not commonly feasible. In this case, it is possible to measure changes at one level and then make bioinformatics predictions to fill the gaps at other levels.
Several previous studies have shown that highly connected nodes (hubs) in the networks, determined by degree parameter, are vital for the organism survival (12). Next studies revealed that essential genes in the network can be determined not only by degree but also by other centrality parameters, such as betweenness or closeness centrality (13,14). Here, we have used a combination of these three network topology parameters to determine the central nodes. Interestingly, pathway enrichment with these central genes was more informative than enrichment with the initial genes or even with the total genes in the expanded PPI networks. This observation is in line with our recent study on diabetic nephropathy showing the central network nodes tend to be present in signaling pathways and cross talks (15).
In pathway enrichment analysis, Hippo, PDGF, and FGFR signaling pathways were detected only in the second time point, 48 hours treatment. Actually, the initial activation of upstream signaling pathways detected in 24 hours treatment may lead to the expression of genes, related to these three pathways after 48 hours. This finding on time-specific expression of genes underscores the importance of time-course designs for gene expression analysis experiments.
Most of the predicted pathways in our analysis such as Notch, TNF, P53, and TGFβ signaling have been previously known to be involved in the pathogenesis of CKD (16)(17)(18)(19), whereas, for some others, such as platelet degranulation pathway, there is not currently direct experimental proof for participation in renal fibrosis. However, previous experiments have shown megakaryocytes as mediators of fibrosis in a subset of hematologic malignancies, idiopathic pulmonary fibrosis, as well as bone marrow (20)(21)(22). The role of megakaryocytes in kidney fibrosis is an interesting topic for future studies.

Conclusion
We have here employed a holistic approach to assess the consequences of TGFβ-1 stimulation in kidney cells. Although, high-throughput techniques are frequently applied in biological investigations, data interpretation is yet commonly limited to the assessment of most up or down-regulated factors missing the huge effect of interactions for genes with subtle expression change. Systems biology provides novel concepts and methods to infer the underlying mechanisms of biological phenomena from omics raw data and hopefully will bring a higher quality of life to those suffering from chronic diseases.